Brief survey of crowdsourcing for data mining

Authors

  • Xintong Guo
  • Hongzhi Wang
  • Yangqiu Song
  • Gao Hong
Abstract

Crowdsourcing allows large-scale, flexible invocation of human input for data gathering and analysis, which introduces a new paradigm for the data mining process. Traditional data mining methods often require experts in the analytic domain to annotate the data, which is expensive and usually time-consuming. Crowdsourcing enables the use of heterogeneous background knowledge from volunteers and distributes the annotation process into small portions of effort from different contributors. This paper reviews the state of the art in crowdsourcing for data mining in recent years. We first review the challenges and opportunities of data mining tasks that use crowdsourcing and summarize their framework. We then highlight several exemplary works for each component of the framework, including question design, data mining, and quality control. Finally, we summarize the limitations of crowdsourcing for data mining and suggest related areas for future research. 2014 Elsevier Ltd. All rights reserved.

Similar resources

Perform Three Data Mining Tasks with Crowdsourcing Process

In data mining studies, performing feature selection by hand is complex, so some of the labeling must be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems that lack sufficient knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...

Crowdsourcing Based on Clustering

Crowdsourcing is the act of outsourcing tasks traditionally performed by an employee or contractor to a large group of people. A recent survey deals with the problem of evaluating submissions to crowdsourcing websites, on which data is increasing rapidly in both volume and complexity. Thus, with an increasing number of submissions, the process of rate submissions, sele...

Automated detection of coronavirus disease (COVID-19) by using data-mining techniques: a brief report

Background: The clinical field holds vast amounts of patient data that have not been analyzed. Finding a way to analyze this raw data and turn it into valuable information can save many lives. Data mining methods are an efficient way to analyze this large amount of raw data. They can predict the future from accurate knowledge of the past, providing new insights into disease diagnosis and prevention. S...

Data Mining in R using Rattle

This paper is a brief introduction to the concepts, methods and algorithms for data mining in the statistical software R using a package named Rattle. Rattle provides a good graphical environment to perform some of the procedures and algorithms without the need for programming. Some parts of the package will be explained by a number of examples. ...

Pattern-mining approach for conflating crowdsourcing road networks with POIs

Bisheng Yang & Yunfei Zhang (2015) Pattern-mining approach for conflating crowdsourcing road networks with POIs, International Journal of Geographical Information Science, 29:5, 786-805, DOI: 10.1080/13658816.2014.997238 To link to this article: http://dx.doi.org/10.108...

Journal:
  • Expert Syst. Appl.

Volume 41

Publication date: 2014